1. Introduction


If VLSI is an adequate technology to implement highly concurrent computations [7], it should be possible to apply to VLSI the already well-established design methods for distributed programming. Ideally, a distributed computation should be described in a notation that can be compiled into a VLSI circuit as well as into code for a stored-program computer. The method described in this paper is a step in that direction. At the moment, the term “compiling” means a “systematic, semantics-preserving transformation”. The ultimate goal, having the transformation carried out automatically, has not yet been achieved, although we believe that it is not remote.

In the method we propose, the computation is initially described as a set of communicating processes in the notation of [3], which is somewhat similar to C.A.R. Hoare’s CSP [2]. This first description is the reference solution, which has to be proved correct. The program is then compiled into a delay-insensitive circuit by applying a series of semantics-preserving transformations. Hence the circuit obtained is correct by construction: all semantic properties that can be proved of the program hold for the circuit as well.

Following [11], a circuit is called delay-insensitive when its correct operation is independent of any assumption on delays in operators and wires, except that the delays are finite. Consequently, such circuits do not use a clock signal: sequencing is enforced entirely by communication mechanisms. Delay-insensitive circuits have been known and used for their elegance, versatility, and robustness, which result from the ideal separation of concerns they provide between the mathematical and physical aspects of circuit design.

The first modern survey on the topic is [10], where such circuits are called self-timed. A different approach, the macro-module approach, is described in [8]. Closer to our method is the recent work at Eindhoven University of Technology, of which [9] is a good survey.

A circuit is a network of elementary operators (and, or, C-element, arbiter, synchronizer, wire, fork). The specification of an operator is a so-called production rule set, where a production rule is a “weaker” form of guarded command, and a production rule set is a “weaker” form of repetition. The compilation relies essentially on the four-phase (also called four-cycle) handshaking expansion of the communications. After expansion, the program of each process is compiled into a production rule set from which all explicit sequencing has been removed. By matching those production rules to those describing the operators, the programs are identified with networks of operators.

The method has already been applied to a whole spectrum of problems, some of them quite difficult, such as distributed mutual exclusion [4] and fair arbitration [5]. The results are beyond our original expectations. For many circuits, especially complex ones, the compiled circuits are superior to their “hand-designed” counterparts, which are often more complex and not entirely delay-insensitive.

We first present the program notation and the VLSI operators that constitute the “object code”. We then describe the four steps of the compilation and illustrate the method with a number of simple examples.

2. The program notation
Sequential part

For the sequential part of the algorithm, we use a subset of Edsger W. Dijkstra’s guarded command language [1], with a slightly different syntax. In this introductory paper we give only a very informal definition of the semantics of the constructs used.

i) $b\uparrow$ stands for $b := \mathrm{true}$, and $b\downarrow$ stands for $b := \mathrm{false}$.

ii) The execution of the selection command $[G_1 \to S_1 \mid \ldots \mid G_n \to S_n]$, where $G_1$ through $G_n$ are Boolean expressions and $S_1$ through $S_n$ are program parts ($G_i$ is called a “guard”, and $G_i \to S_i$ a “guarded command”), amounts to the execution of an arbitrary $S_i$ for which $G_i$ holds. If $\neg(G_1 \lor \ldots \lor G_n)$ holds, the execution of the command is suspended until $(G_1 \lor \ldots \lor G_n)$ holds.

iii) For atomic actions $x$ and $y$, “$x, y$” stands for the execution of $x$ and $y$ in any order.

iv) $[G]$, where $G$ is a Boolean expression, stands for $[G \to \mathrm{skip}]$, and thus for “wait until $G$ holds”. (Hence, “$[G]; S$” and $[G \to S]$ are equivalent.)

v) $*[S]$ stands for “repeat $S$ forever”.

vi) From ii) and v), the operational description of the statement

$$*[[G_1 \to S_1 \mid \ldots \mid G_n \to S_n]]$$

is “repeat forever: wait until some $G_i$ holds; execute an $S_i$ for which $G_i$ holds”.
Communicating processes

A concurrent computation is described as a set of processes composed by the usual parallel composition operator $\|$. Processes communicate with each other by communication actions on channels; they do not share variables. When no messages are transmitted, communication on a channel is reduced to synchronization signals. The name of the channel is then sufficient for identifying a communication action.

If two processes $p_1$ and $p_2$ share a channel named $X$ in $p_1$ and $Y$ in $p_2$, at any time the number of completed $X$-actions in $p_1$ equals the number of completed $Y$-actions in $p_2$. In other words, the completion of the $n$-th $X$-action “coincides” with the completion of the $n$-th $Y$-action. If, for example, $p_1$ reaches the $n$-th $X$-action before $p_2$ reaches the $n$-th $Y$-action, the completion of $X$ is suspended until $p_2$ reaches $Y$. The $X$-action is then said to be pending. When thereafter $p_2$ reaches $Y$, both $X$ and $Y$ are completed. The predicate “$X$ is pending” is denoted $qX$. If, for an arbitrary command $X$, $cX$ denotes the number of completed $X$-actions, the semantics of a pair $(X, Y)$ of communication commands is expressed by the two axioms:

$$cX = cY \qquad \text{(A1)}$$

$$\neg qX \lor \neg qY. \qquad \text{(A2)}$$


Probe

Instead of the usual selection mechanism by which a set of pending communication actions can be selected for execution, we provide a general Boolean command on channels, called the probe. The definition of the probe given in [3] states that in process $p_1$, the probe command $\overline{X}$ has the same value as $qY$. Here, we use a weaker definition, namely:

$$\overline{X} \Rightarrow qY$$
$$qY \Rightarrow \Diamond\overline{X},$$

where $\Diamond P$ means “$P$ holds eventually”.

Hence the guarded command $\overline{X} \to X$ guarantees that the $X$-action is not suspended. And a construct of the form $[\overline{X} \to X \mid \overline{Y} \to Y]$ can be used for selection. (For a more rigorous definition of the communication mechanism and the probe, see [3].)

3. The “Object Code”

The set of operators with which we want to build our circuits is not unique. In this introduction, we will use the simple set consisting of and, or, C-element, wire, and fork. We believe that this simple set, extended with an arbiter and a synchronizer, is sufficient for compiling any program. Each operator is described by a set of production rules. A production rule is similar to a guarded command, and we shall therefore use a similar syntax. There are, however, important semantic differences. Consider the production rule $G \mapsto S$:

• $S$ is either a simple assignment or of the form “$s1, s2$”, where $s1$ and $s2$ are each a simple assignment.

• If $G$ holds, the correct execution of $S$ is guaranteed only if $G$ remains invariantly true until the completion of $S$. We say that $G$ must be stable.

• Unlike the guarded commands of a selection or a repetition, the mutual exclusion among the different production rules of a set is not guaranteed automatically. It has to be enforced by the semantics of the program.

• If stability of the guards and mutual exclusion among guards are guaranteed, the production rule set PRS is semantically equivalent to the repetition $*[[GCS]]$, where GCS is the guarded command set syntactically identical to PRS.
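The rules above can be prototyped in a few lines. This is a minimal Python sketch under our own assumptions, not part of the method: a production rule is a pair (guard, assignments), one rule whose execution would change some variable is fired at a time, chosen arbitrarily, and firings that would change nothing are skipped. The sketch does not check stability or mutual exclusion of the guards; those remain proof obligations on the program. For a rule of the form “s1, s2” the two assignments are applied together, which is one of the admissible orders. The names `execute`, `Rule`, and `c_element` are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]
# A production rule G |-> S: a guard over the state, and the one or two
# simple assignments of S (variable -> new value).
Rule = Tuple[Callable[[State], bool], Dict[str, bool]]


def execute(rules: List[Rule], state: State, seed: int = 0, max_steps: int = 1000) -> int:
    """Fire arbitrarily chosen rules until no rule can change the state.

    A rule is a candidate only if its guard holds AND its execution would be
    effective (change some variable). Returns the number of effective firings.
    """
    rng = random.Random(seed)
    fired = 0
    for _ in range(max_steps):
        candidates = [(g, a) for g, a in rules
                      if g(state) and any(state[v] != b for v, b in a.items())]
        if not candidates:
            break
        _, assignments = rng.choice(candidates)
        state.update(assignments)
        fired += 1
    return fired


# The C-element (x, y) C z written as a production rule set.
c_element: List[Rule] = [
    (lambda s: s["x"] and s["y"], {"z": True}),           # x & y   |-> z up
    (lambda s: not s["x"] and not s["y"], {"z": False}),  # ~x & ~y |-> z down
]

state = {"x": True, "y": True, "z": False}
print(execute(c_element, state), state["z"])  # 1 True: one effective firing sets z
state["x"] = False
print(execute(c_element, state), state["z"])  # 0 True: no guard holds, z is kept
```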

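The description of the five operators used in this paper in terms of their production rules and their logic symbols is as follows.

[Figure: logic symbols of the C-element, “and”, “or”, wire, and fork]

The C-element: $(x, y)\ C\ z$

$$
\begin{array}{l}
x \land y \mapsto z\uparrow \\
\neg x \land \neg y \mapsto z\downarrow
\end{array}
$$

The “and”: $(x, y) \land z$

$$
\begin{array}{l}
x \land y \mapsto z\uparrow \\
\neg x \lor \neg y \mapsto z\downarrow
\end{array}
$$

The “or”: $(x, y) \lor z$

$$
\begin{array}{l}
x \lor y \mapsto z\uparrow \\
\neg x \land \neg y \mapsto z\downarrow
\end{array}
$$

The wire: $x\ w\ y$

$$
\begin{array}{l}
x \mapsto y\uparrow \\
\neg x \mapsto y\downarrow
\end{array}
$$

The fork: $x / (y, z)$

$$
\begin{array}{l}
x \mapsto y\uparrow, z\uparrow \\
\neg x \mapsto y\downarrow, z\downarrow
\end{array}
$$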

Any input or output variable of an operator may be negated. In particular, a wire with its input or its output negated, but not both, is an inverter. A negated input or output is represented in the figures by a small circle on the corresponding line.
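As a cross-check of these definitions, each operator can also be read as a function giving the value its output settles to, given its inputs and, where relevant, its current output. In this Python sketch (all function names are ours), only the C-element depends on its current output, i.e. only the C-element holds state; and, or, wire, and fork are combinational, and the inverter is modeled as a wire with its input negated.

```python
from itertools import product


def c_element(x: bool, y: bool, z: bool) -> bool:
    # x & y |-> z up ; ~x & ~y |-> z down ; otherwise no rule applies and z is kept
    return x if x == y else z


def and_op(x: bool, y: bool, z: bool) -> bool:
    # x & y |-> z up ; ~x | ~y |-> z down (one of the two guards always holds)
    return x and y


def or_op(x: bool, y: bool, z: bool) -> bool:
    # x | y |-> z up ; ~x & ~y |-> z down
    return x or y


def wire(x: bool) -> bool:
    # x |-> y up ; ~x |-> y down
    return x


def inverter(x: bool) -> bool:
    # a wire with its input negated: ~x |-> y up ; x |-> y down
    return not x


def fork(x: bool) -> tuple:
    # x |-> y up, z up ; ~x |-> y down, z down
    return (x, x)


def holds_state(op) -> bool:
    """True if, for some inputs, the settled output depends on the current output."""
    return any(op(x, y, False) != op(x, y, True)
               for x, y in product([False, True], repeat=2))


print({op.__name__: holds_state(op) for op in (c_element, and_op, or_op)})
# {'c_element': True, 'and_op': False, 'or_op': False}
```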


4. The Compilation Method

Process Decomposition

The first step of the compilation, called “process decomposition”, consists of replacing a process by several processes that together are semantically equivalent to it. The purpose of the decomposition is to obtain a process representation of the program in which the right-hand side of each guarded command is a straight-line program, i.e. consists only of simple assignments and communication commands, composed by semicolons and commas.

Decomposition rule: A process $P$ containing an arbitrary program part $S$ is semantically equivalent to two processes $P_1$ and $P_2$, where $P_1$ is derived from $P$ by replacing $S$ by a communication action $C$ on the newly introduced channel $(C, D)$ between $P_1$ and $P_2$, and

$$P_2 = *[[\overline{D} \to S;\ D]].$$

Observe that the above decomposition does not introduce concurrency. Although $P_1$ and $P_2$ are potentially concurrent processes, they are never active concurrently: $P_2$ is activated from $P_1$, much as a procedure or a coroutine would be. The only purpose of this transformation is to simplify the structure of each command. As an example, consider the process:


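$$P = *[[\ldots A;\ [B_1 \to S_1 \mid B_2 \to S_2];\ \ldots]].$$

Applying the decomposition rule, $P$ is replaced by the two processes $P_1$ and $P_2$. Channel $(C, D)$ is introduced between $P_1$ and $P_2$.

$$P_1 = *[[\ldots A;\ C;\ \ldots]]$$

$$
\begin{array}{l}
P_2 = *[[\ \overline{D} \land B_1 \to S_1;\ D \\
\qquad\ \ \mid\ \overline{D} \land B_2 \to S_2;\ D \\
\qquad ]].
\end{array}
$$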

Observe that the newly created processes $P_1$ and $P_2$ may share variables. Since the processes are never active concurrently, there is no conflicting access to the shared variables. Process decomposition is applied repeatedly until the right-hand side of each guarded command is a straight-line program.

Handshaking Expansion

The implementation of communication, called “handshaking expansion”, replaces each channel by a pair of wire operators and each communication action by its implementation. Channel $(X, Y)$ is implemented by the two wires $(xo\ w\ yi)$ and $(yo\ w\ xi)$.

If $X$ belongs to process $p_1$ and $Y$ to process $p_2$, $xo$ and $xi$ belong to $p_1$, and $yo$ and $yi$ belong to $p_2$. Initially, $xo$, $xi$, $yo$, and $yi$, which we will call the “handshaking variables of $(X, Y)$”, are false. Assume that the program has been proved to be deadlock-free and that we can identify a pair of matching actions $X$ and $Y$ in $p_1$ and $p_2$ respectively. We replace $X$ and $Y$ by the sequences $U_x$ and $U_y$ respectively, with:

$$U_x = xo\uparrow;\ [xi]$$

$$U_y = [yi];\ yo\uparrow.$$

The formal proof that $U_x$ and $U_y$ fulfil axioms A1 and A2 is omitted. The following is an informal argument that relies on a definition of completion of an action different from the usual one. Since the argument is not essential to the comprehension of the method, it may be skipped at first reading.
Assume that we know what the initiation and termination of an atomic action mean. A non-atomic action is initiated when its first atomic action is initiated. A non-atomic action is terminated when its last atomic action is terminated.

A non-atomic action is said to be completed when it is initiated and it is guaranteed to terminate.

(An atomic action is completed when it is terminated.) Between initiation and completion, an action is suspended.

Obviously, $U_x$ and $U_y$ are guaranteed to terminate if and only if they are both initiated, which establishes A1 and A2.

It is essential to observe that these definitions of completion and suspension are valid because they satisfy the semantic properties of completion and suspension that are used in correctness arguments, namely:

$$\{cX = k\}\ X\ \{cX = k + 1\}$$

$$qX \Rightarrow pre(X)$$

where $pre(X)$ is any precondition of $X$ in terms of the program variables and auxiliary program variables.

(This completes the argument.)

Unfortunately, when the communication terminates, all handshaking variables are true. Hence, we cannot implement the next communication with $U_x$ and $U_y$. However, the complementary implementation can be used for the next matching pair, namely:

$$D_x = xo\downarrow;\ [\neg xi]$$

$$D_y = [\neg yi];\ yo\downarrow.$$

The solution consisting of alternating $U_x$ and $D_x$ as an implementation of $X$, and $U_y$ and $D_y$ as an implementation of $Y$, is essentially the so-called “two-phase handshaking”, or “two-cycle signaling”. However, it is in general not possible to determine syntactically which $X$- or $Y$-actions follow each other in an execution. In general, two-phase handshaking implementations therefore require testing the current value of the variables. In this paper, we shall use a simpler but less efficient solution known as “four-phase handshaking”, or “four-cycle signaling”.

In a four-phase handshaking protocol, all $X$-actions are implemented as “$U_x; D_x$” and all $Y$-actions as “$U_y; D_y$”. Observe that the D-parts in $X$ and $Y$ introduce an extra communication between the two processes whose only purpose is to reset all variables to false. The synchronization introduced by this extra communication is unnoticeable, since the immediately preceding communication implemented by $U_x$ and $U_y$ sees to it that both processes reach a matching $D_x$ and $D_y$ “at the same time”.
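The difference between the two protocols can be made concrete by listing the handshaking actions that implement $n$ successive $X$-actions on the active side. In this Python sketch (the string notation, with "[xi]" for a wait, and the names `two_phase` and `four_phase` are ours), two-phase handshaking alternates $U_x$ and $D_x$, so the code for the $k$-th action depends on the parity of $k$, which in general is known only at run time; four-phase handshaking always uses $U_x; D_x$, at the price of twice as many transitions.

```python
from typing import List

U_X = ["xo+", "[xi]"]    # U_x = xo up; [xi]
D_X = ["xo-", "[~xi]"]   # D_x = xo down; [~xi]


def two_phase(n: int) -> List[str]:
    """n successive X-actions: the k-th one is U_x if k is even, D_x if k is odd."""
    actions: List[str] = []
    for k in range(n):
        actions += U_X if k % 2 == 0 else D_X
    return actions


def four_phase(n: int) -> List[str]:
    """n successive X-actions, each implemented as U_x; D_x."""
    return (U_X + D_X) * n


print(two_phase(3))   # ['xo+', '[xi]', 'xo-', '[~xi]', 'xo+', '[xi]']
print(four_phase(2))  # ['xo+', '[xi]', 'xo-', '[~xi]', 'xo+', '[xi]', 'xo-', '[~xi]']
```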

Both protocols have the property that for a matching pair $(X, Y)$ of actions, the implementation is not symmetrical in $X$ and $Y$. One action is called active and the other one passive. The four-phase implementation with $X$ active and $Y$ passive is:

$$X = xo\uparrow;\ [xi];\ xo\downarrow;\ [\neg xi] \qquad (1)$$

$$Y = [yi];\ yo\uparrow;\ [\neg yi];\ yo\downarrow \qquad (2)$$


When no action of a matching pair is probed, the choice of which one should be active and which one passive is arbitrary, but a choice has to be made. The choice can be important for the composition of identical circuits. A simple rule is that for a given channel $(X, Y)$, all actions at one side are active and all actions at the other side passive. If $\overline{X}$ is used, all $X$-actions are passive, with the obvious restriction that $\overline{Y}$ cannot be used in the same program.

The implementation of the probe is simply:

$$\overline{X} = xi, \qquad \overline{Y} = yi. \qquad (3)$$

Given our definition of suspension, the proof that this implementation of the probe fulfils the definition of Section 2 is straightforward and is omitted.

A probed communication action $\overline{X} \to \ldots X$ is implemented as:

$$xi \to \ldots xo\uparrow;\ [\neg xi];\ xo\downarrow.$$


Basic properties

The following properties of the handshaking protocol play an important role in the compilation method.

Property 1: For the pair of wires $(xo\ w\ yi)$ and $(yo\ w\ xi)$, used together as in (1) and (2), and all variables false initially, the following sequence of transitions is guaranteed to occur if the system is deadlock-free:

$$*[xo\uparrow;\ yi\uparrow;\ yo\uparrow;\ xi\uparrow;\ xo\downarrow;\ yi\downarrow;\ yo\downarrow;\ xi\downarrow]. \qquad (4)$$

Hence, the following postconditions hold:

$$
\begin{array}{l}
xo\uparrow\ \{\Diamond xi\} \\
xo\downarrow\ \{\Diamond\neg xi\} \qquad (5) \\
yo\uparrow\ \{\Diamond\neg yi\}
\end{array}
$$
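Property 1 can be checked by simulation. In the following Python sketch, which is our own illustration with hypothetical names (`run`, `active`, `passive`), processes (1) and (2) are generators, the two wires are production rules, and arbitrary finite delays are mimicked by choosing at random among the enabled moves. The scheduler also asserts that a rule whose guard has become true keeps a true guard until it fires. Whatever the interleaving, the trace should be the cyclic sequence (4).

```python
import random


def run(rules, procs, state, seed=0, max_steps=10_000):
    """Fire randomly chosen enabled moves until none is left; return the trace.

    rules: production rules (guard, var, value), each with one assignment.
    procs: generators yielding ("set", var, value) or ("wait", predicate).
    Transitions are recorded as "v+" (v becomes true) or "v-".
    """
    rng = random.Random(seed)
    req = {n: next(p, None) for n, p in procs.items()}
    trace = []
    for _ in range(max_steps):
        ready = [r for r in rules if r[0](state) and state[r[1]] != r[2]]
        moves = [("rule", r) for r in ready] + [
            ("proc", n) for n, q in req.items()
            if q is not None and (q[0] == "set" or q[1](state))]
        if not moves:
            break  # quiescent: every process has finished or is waiting
        kind, what = rng.choice(moves)
        act = what if kind == "rule" else req[what]
        if kind == "rule" or act[0] == "set":
            state[act[1]] = act[2]
            trace.append(act[1] + ("+" if act[2] else "-"))
        if kind == "proc":
            req[what] = next(procs[what], None)  # advance the process
        for r in ready:  # stability: a ready rule keeps a true guard until it fires
            assert r is what or r[0](state), "unstable guard for " + r[1]
    return trace


def active(out, inp, n):
    """Active side, as in (1): out up; [inp]; out down; [~inp], n times."""
    for _ in range(n):
        yield ("set", out, True)
        yield ("wait", lambda s: s[inp])
        yield ("set", out, False)
        yield ("wait", lambda s: not s[inp])


def passive(inp, out, n):
    """Passive side, as in (2): [inp]; out up; [~inp]; out down, n times."""
    for _ in range(n):
        yield ("wait", lambda s: s[inp])
        yield ("set", out, True)
        yield ("wait", lambda s: not s[inp])
        yield ("set", out, False)


# The wires (xo w yi) and (yo w xi), written as production rules.
WIRES = [
    (lambda s: s["xo"], "yi", True), (lambda s: not s["xo"], "yi", False),
    (lambda s: s["yo"], "xi", True), (lambda s: not s["yo"], "xi", False),
]


def property1_trace(cycles, seed):
    state = dict.fromkeys(["xo", "xi", "yo", "yi"], False)
    procs = {"X": active("xo", "xi", cycles), "Y": passive("yi", "yo", cycles)}
    return run(WIRES, procs, state, seed)


print(property1_trace(1, seed=42))
# ['xo+', 'yi+', 'yo+', 'xi+', 'xo-', 'yi-', 'yo-', 'xi-']
```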

Property 2: Consider the handshaking expansion of a program $p$ according to (1), (2), and (3). Provided that the cyclic order of the four handshaking actions of a communication command is respected, the last two actions of this command (the two actions of $D_x$ or $D_y$) can be inserted at any place in $p$ without invalidating the semantics of the communication involved. However, modifying the order of these two actions relative to other actions of $p$ may introduce deadlock.

Property 2 is a direct consequence of the way in which we have introduced the sequences $D_x$ and $D_y$. We will see examples of how to use Property 2. In this paper, we will ignore the deadlock issue when we re-order handshaking actions.
First example: stack element

Consider the simple process $S$, which we call a “stack element”:

$$S = *[[\overline{L} \to R;\ L]],$$

where $L$ and $R$ are channels. Since $L$ is probed, it must be passive, and if we want to compose $S$-processes together, $R$ must be active, since it will match a passive $L$. The handshaking expansion gives:

$$*[[li];\ ro\uparrow;\ [ri];\ ro\downarrow;\ [\neg ri];\ lo\uparrow;\ [\neg li];\ lo\downarrow]. \qquad (6)$$

5. Production-rule expansion

The next step is to compile the handshaking expansion of the program into a set of production rules from which all explicit sequencing has been removed. By matching those production rules to those describing the semantics of operators, the programs can be identified with networks of operators. We use the compilation of $S$ to illustrate the different steps of the expansion.

We start with the production rule set syntactically derived from the program. In the case of $S$, it is the set derived from (6), namely:

$$
\begin{array}{l}
li \mapsto ro\uparrow \\
ri \mapsto ro\downarrow \\
\neg ri \mapsto lo\uparrow \\
\neg li \mapsto lo\downarrow.
\end{array}
$$

The execution of a production rule is called effective if it changes the value of a variable. Otherwise, it is called vacuous. We ignore vacuous executions of production rules.

For each guarded command of the program, the production rule set representation is semantically equivalent to the program representation if and only if the order of execution of effective production rules is the same as the order of the corresponding transitions in the program; we call it the program order. (As an aid to the reader, we list the production rules of a set in program order.)

In general, we have to strengthen the guards of some rules to enforce execution in program order. This is the case in our example: since $\neg ri$ holds initially, the third production rule can be executed first. The same is true for the fourth production rule, but the execution of the fourth rule in the initial state is vacuous.

Because all handshaking variables of $R$ are back to false when $R$ is completed, we cannot find a guard for the transition $lo\uparrow$: the state in which $lo\uparrow$ must take place ($li$ true; $ri$, $ro$, and $lo$ false) is identical to the state in which $ro\uparrow$ must take place. (Hence, the transitions following a semicolon that can be identified with a semicolon of the original program are likely to be difficult to deal with.)

Direct implementation

In order to define uniquely the state in which the transition $lo\uparrow$ is to take place, the first technique consists in introducing a state variable, say $x$, initially false. $S$ becomes

$$*[[li];\ ro\uparrow;\ [ri];\ x\uparrow;\ [x];\ ro\downarrow;\ [\neg ri];\ lo\uparrow;\ [\neg li];\ x\downarrow;\ [\neg x];\ lo\downarrow]. \qquad (7)$$

Now, the production-rule expansion can be performed:

$$
\begin{array}{lll}
\neg x \land li \mapsto ro\uparrow & \{\Diamond ri\} & (S1) \\
ri \mapsto x\uparrow & \{x\} & (S2) \\
x \mapsto ro\downarrow & \{x \land \Diamond\neg ri\} & (S3) \\
x \land \neg ri \mapsto lo\uparrow & \{\Diamond\neg li\} & (S4) \\
\neg li \mapsto x\downarrow & \{\neg x\} & (S5) \\
\neg x \mapsto lo\downarrow. & & (S6)
\end{array}
$$

(Why is the conjunct $\neg x$ necessary in the first rule?) Using the postconditions indicated between braces (these conditions rely on (5)), it is easy to verify that the production rules of the set are executed in program order. Hence, the execution of the production rule set is equivalent to the execution of (7).
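The claim that S1 through S6 execute in program order can be exercised with a small simulation. The Python sketch below is our own illustration, not part of the method: the names `run`, `active`, `passive`, and `closed_loop` are hypothetical, the environment is assumed to be an active four-phase partner on $L$ and a passive one on $R$, and arbitrary delays are mimicked by choosing at random among the enabled moves. The scheduler also asserts that every enabled rule keeps a true guard until it fires, i.e. that the guards are stable.

```python
import random


def run(rules, procs, state, seed=0, max_steps=10_000):
    """Fire randomly chosen enabled moves until none is left; return the trace.

    rules: production rules (guard, var, value); procs: generators yielding
    ("set", var, value) or ("wait", predicate). Trace entries are "v+" / "v-".
    """
    rng = random.Random(seed)
    req = {n: next(p, None) for n, p in procs.items()}
    trace = []
    for _ in range(max_steps):
        ready = [r for r in rules if r[0](state) and state[r[1]] != r[2]]
        moves = [("rule", r) for r in ready] + [
            ("proc", n) for n, q in req.items()
            if q is not None and (q[0] == "set" or q[1](state))]
        if not moves:
            break
        kind, what = rng.choice(moves)
        act = what if kind == "rule" else req[what]
        if kind == "rule" or act[0] == "set":
            state[act[1]] = act[2]
            trace.append(act[1] + ("+" if act[2] else "-"))
        if kind == "proc":
            req[what] = next(procs[what], None)
        for r in ready:  # stability check
            assert r is what or r[0](state), "unstable guard for " + r[1]
    return trace


def active(out, inp, n):
    for _ in range(n):  # out up; [inp]; out down; [~inp]
        yield ("set", out, True)
        yield ("wait", lambda s: s[inp])
        yield ("set", out, False)
        yield ("wait", lambda s: not s[inp])


def passive(inp, out, n):
    for _ in range(n):  # [inp]; out up; [~inp]; out down
        yield ("wait", lambda s: s[inp])
        yield ("set", out, True)
        yield ("wait", lambda s: not s[inp])
        yield ("set", out, False)


S_DIRECT = [
    (lambda s: not s["x"] and s["li"], "ro", True),  # S1
    (lambda s: s["ri"], "x", True),                  # S2
    (lambda s: s["x"], "ro", False),                 # S3
    (lambda s: s["x"] and not s["ri"], "lo", True),  # S4
    (lambda s: not s["li"], "x", False),             # S5
    (lambda s: not s["x"], "lo", False),             # S6
]


def closed_loop(rules, extra, n, seed):
    """S between an (assumed) active environment on L and a passive one on R."""
    state = dict.fromkeys(["li", "lo", "ri", "ro"] + extra, False)
    env = {"L": active("li", "lo", n), "R": passive("ro", "ri", n)}
    return run(rules, env, state, seed)


print(closed_loop(S_DIRECT, ["x"], 1, seed=3))
# ['li+', 'ro+', 'ri+', 'x+', 'ro-', 'ri-', 'lo+', 'li-', 'x-', 'lo-']
```

Restricted to the variables written by $S$, the trace is $ro\uparrow, x\uparrow, ro\downarrow, lo\uparrow, x\downarrow, lo\downarrow$, the order of (7).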

Re-ordering implementation

Another way to find a valid guard for $lo\uparrow$ is to use Property 2 to re-order the actions of (6). For instance, we can postpone the second half of the handshaking expansion of $R$, i.e. the sequence $ro\downarrow;\ [\neg ri]$, until after $[\neg li]$. We get:

$$*[[li];\ ro\uparrow;\ [ri];\ lo\uparrow;\ [\neg li];\ ro\downarrow;\ [\neg ri];\ lo\downarrow]. \qquad (8)$$

The syntactic production rule expansion is already “program ordered”:

$$
\begin{array}{l}
li \mapsto ro\uparrow \\
ri \mapsto lo\uparrow \\
\neg li \mapsto ro\downarrow \\
\neg ri \mapsto lo\downarrow.
\end{array}
$$


6. Operator reduction

The last step of the compilation, called operator reduction, consists in identifying sets of production rules in the program with sets of production rules describing operators. The program can then be identified with a set of operators. We group pairs of production rules that modify the same variable.

If a given group cannot be directly identified with the production rule set of an operator, we perform on this group a last transformation called symmetrization: we transform the guards of the production rules, again under invariance of the semantics, so as to make them “look like” the guards of operators. In case a guard contains too many variables, this step may also involve decomposing a production rule into several production rules by introducing new internal variables.

Consider S1 and S3. No operator corresponds to these rules. But if we replace $x$ by $\neg li \lor x$ in S3, the value of the guard of S3 is not changed, since $li$ holds as a precondition of S3, and now the two production rules represent the operator $(\neg x, li) \land ro$. Since we have weakened the guard of S3, we have to check that we have not enlarged the set of states in which S3 can be effectively executed. No such state has been added: in every reachable state in which $\neg li$ holds, $ro$ is already false, so S3 can only execute vacuously there. Hence the transformation is safe.

In the case of S2 and S5, no guard can be weakened. We therefore strengthen both of them as

$$
\begin{array}{l}
ri \land li \mapsto x\uparrow \\
\neg ri \land \neg li \mapsto x\downarrow,
\end{array}
$$

which corresponds to the C-element $(ri, li)\ C\ x$. Observe that strengthening the guards in this way is always possible since the guards are mutually exclusive by construction. Hence it is always possible to implement a pair of guards with a C-element. Why then bother about weakening the guards? The answer is that introducing a disjunction is the only transformation leading to combinational operators (and, or), which are usually less “expensive” than C-elements, since a C-element is a state-holding operator.

For the direct implementation of $S$, the symmetrization of the set S1 through S6 gives:

$$
\begin{array}{ll}
\neg x \land li \mapsto ro\uparrow & (S1) \\
ri \land li \mapsto x\uparrow & (S2) \\
\neg li \lor x \mapsto ro\downarrow & (S3) \\
x \land \neg ri \mapsto lo\uparrow & (S4) \\
\neg ri \land \neg li \mapsto x\downarrow & (S5) \\
ri \lor \neg x \mapsto lo\downarrow. & (S6)
\end{array}
$$

The identification with operators is now straightforward.

(S1, S3) corresponds to $(\neg x, li) \land ro$.

(S2, S5) corresponds to $(li, ri)\ C\ x$.

(S4, S6) corresponds to $(x, \neg ri) \land lo$.
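The safety check on weakened and strengthened guards can be mechanized for this example. In the following Python sketch (our own illustration; `CYCLE`, `effective`, and `mismatches` are hypothetical names), the reachable states are taken from one closed-loop cycle of (7), under the assumption of an active four-phase environment on $L$ and a passive one on $R$. For each reachable state, it compares whether each original rule S1 through S6 and its symmetrized counterpart can execute effectively. An empty result means symmetrization neither enlarged nor shrank the set of states in which any rule fires effectively.

```python
# Assumed closed-loop order of transitions of (7) with its environment.
CYCLE = ["li+", "ro+", "ri+", "x+", "ro-", "ri-", "lo+", "li-", "x-", "lo-"]

ORIGINAL = {
    "S1": (lambda s: not s["x"] and s["li"], "ro", True),
    "S2": (lambda s: s["ri"], "x", True),
    "S3": (lambda s: s["x"], "ro", False),
    "S4": (lambda s: s["x"] and not s["ri"], "lo", True),
    "S5": (lambda s: not s["li"], "x", False),
    "S6": (lambda s: not s["x"], "lo", False),
}

SYMMETRIZED = {
    "S1": (lambda s: not s["x"] and s["li"], "ro", True),
    "S2": (lambda s: s["ri"] and s["li"], "x", True),           # strengthened
    "S3": (lambda s: not s["li"] or s["x"], "ro", False),       # weakened
    "S4": (lambda s: s["x"] and not s["ri"], "lo", True),
    "S5": (lambda s: not s["ri"] and not s["li"], "x", False),  # strengthened
    "S6": (lambda s: s["ri"] or not s["x"], "lo", False),       # weakened
}


def reachable_states():
    """States visited along one cycle, starting with all variables false."""
    state = dict.fromkeys(["li", "lo", "ri", "ro", "x"], False)
    states = [dict(state)]
    for t in CYCLE:
        state[t[:-1]] = t.endswith("+")
        states.append(dict(state))
    return states


def effective(rule, s):
    guard, var, val = rule
    return guard(s) and s[var] != val


def mismatches(original, transformed):
    """(rule, state) pairs on which the two rule sets disagree about effective firing."""
    return [(name, s) for s in reachable_states() for name in original
            if effective(original[name], s) != effective(transformed[name], s)]


print(mismatches(ORIGINAL, SYMMETRIZED))  # []
```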

Isochronic forks

In the previous operator reduction, $li$ is input to the C-element $(li, ri)\ C\ x$ and to the and-operator $(li, \neg x) \land ro$. Formally, in order to compose the circuit we have to introduce the fork $li / (l1, l2)$ and replace $li$ by $l1$ in the C-element and by $l2$ in the and-operator.

Since the fork is delay-insensitive, $l1$ and $l2$ are not guaranteed to have the same value in all states, whereas the two operators are constructed with the same input variable $li$. We solve this problem by making a simplifying assumption: we assume that the forks used to connect operators inside a process are isochronic, i.e. the delays in these forks are short enough, compared to the delays in all operators other than forks and wires, to assume that the two outputs of an isochronic fork have the same value at any time.

The resulting circuit is shown in Fig. 1.


For the second implementation of $S$, with re-ordering of actions, the production rule set can be reduced directly: the first and third rules specify the wire $li\ w\ ro$, and the second and fourth rules specify the wire $ri\ w\ lo$. The circuit is shown in Fig. 2.

-Figure 2-

Comparing the circuits of Figs. 1 and 2, we observe that the re-ordering of handshaking actions leads to a simpler implementation. This observation is true in general, although the gain is not always as drastic as in this case. We also observe that re-ordering handshaking actions modifies the behavior of the circuit concerning its synchronization with its environment. This is not surprising, since the second half of a handshaking sequence (the part that we shift from its place) is an extra synchronization action. Placed just after the first half, this second synchronization has no noticeable effect. But its synchronization effect becomes noticeable when the action is shifted away from the first half of the handshaking sequence. Hence the choice to re-order actions is a choice in favor of a simpler circuit at the cost of modifying the original synchronization behavior of the circuit, in general for the worse.
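The change in synchronization behavior can be observed in simulation. The following Python sketch is our own illustration with hypothetical names, assuming a four-phase environment that is active on $L$ and passive on $R$, and random interleavings to mimic arbitrary delays. It runs the direct implementation S1 through S6 and the re-ordered one (the wires $li\ w\ ro$ and $ri\ w\ lo$), and reports whether $S$ acknowledges $L$ ($lo\uparrow$) before $R$ has been reset ($ri\downarrow$).

```python
import random


def run(rules, procs, state, seed=0, max_steps=10_000):
    """Random interleaving of production rules and generator processes."""
    rng = random.Random(seed)
    req = {n: next(p, None) for n, p in procs.items()}
    trace = []
    for _ in range(max_steps):
        ready = [r for r in rules if r[0](state) and state[r[1]] != r[2]]
        moves = [("rule", r) for r in ready] + [
            ("proc", n) for n, q in req.items()
            if q is not None and (q[0] == "set" or q[1](state))]
        if not moves:
            break
        kind, what = rng.choice(moves)
        act = what if kind == "rule" else req[what]
        if kind == "rule" or act[0] == "set":
            state[act[1]] = act[2]
            trace.append(act[1] + ("+" if act[2] else "-"))
        if kind == "proc":
            req[what] = next(procs[what], None)
        for r in ready:  # stability check
            assert r is what or r[0](state), "unstable guard for " + r[1]
    return trace


def active(out, inp, n):
    for _ in range(n):
        yield ("set", out, True)
        yield ("wait", lambda s: s[inp])
        yield ("set", out, False)
        yield ("wait", lambda s: not s[inp])


def passive(inp, out, n):
    for _ in range(n):
        yield ("wait", lambda s: s[inp])
        yield ("set", out, True)
        yield ("wait", lambda s: not s[inp])
        yield ("set", out, False)


S_DIRECT = [
    (lambda s: not s["x"] and s["li"], "ro", True),  # S1
    (lambda s: s["ri"], "x", True),                  # S2
    (lambda s: s["x"], "ro", False),                 # S3
    (lambda s: s["x"] and not s["ri"], "lo", True),  # S4
    (lambda s: not s["li"], "x", False),             # S5
    (lambda s: not s["x"], "lo", False),             # S6
]

S_REORDERED = [
    (lambda s: s["li"], "ro", True),       # wire li w ro
    (lambda s: s["ri"], "lo", True),       # wire ri w lo
    (lambda s: not s["li"], "ro", False),
    (lambda s: not s["ri"], "lo", False),
]


def closed_loop(rules, extra, n, seed):
    state = dict.fromkeys(["li", "lo", "ri", "ro"] + extra, False)
    env = {"L": active("li", "lo", n), "R": passive("ro", "ri", n)}
    return run(rules, env, state, seed)


def acknowledged_before_reset(trace):
    """True if S raises lo (acknowledging L) before R's ri has come down."""
    return trace.index("lo+") < trace.index("ri-")


for rules, extra in ((S_DIRECT, ["x"]), (S_REORDERED, [])):
    tr = closed_loop(rules, extra, 1, seed=0)
    print(tr, acknowledged_before_reset(tr))
```

In the direct implementation the environment on $L$ is acknowledged only after $R$ has completed its reset phase; in the re-ordered one it is acknowledged as soon as $ri\uparrow$ arrives, and $L$ and $R$ then reset together.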

7. Second example: one-place buffer

Our second example is the simple “one-place buffer” process

$$B = *[L;\ R],$$

where $L$ and $R$ are two channels. The handshaking expansion of $B$ gives:

$$B = *[[li];\ lo\uparrow;\ [\neg li];\ lo\downarrow;\ ro\uparrow;\ [ri];\ ro\downarrow;\ [\neg ri]]. \qquad (9)$$
Here the difficult transition is $ro\uparrow$: it must take place in a state (all handshaking variables false) that is identical to the initial state, in which no transition may take place. In this example we construct only the solution obtained by re-ordering of actions. The construction of the solution with introduction of a state variable is more difficult and is left as an exercise to the reader. (It is described in [6].) If we postpone the second half of the handshaking expansion of $L$ until after $[ri]$, we get:

$$*[[li];\ lo\uparrow;\ ro\uparrow;\ [ri];\ [\neg li];\ lo\downarrow;\ ro\downarrow;\ [\neg ri]],$$

which we can also re-order as:

$$*[[\neg ri];\ [li];\ lo\uparrow;\ ro\uparrow;\ [ri];\ [\neg li];\ lo\downarrow;\ ro\downarrow]. \qquad (10)$$

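The order between two successive transitions on output variables, like $lo\uparrow; ro\uparrow$, is irrelevant. Hence the production-rule expansion of (10) gives:

$$
\begin{array}{l}
\neg ri \land li \mapsto lo\uparrow, ro\uparrow \\
ri \land \neg li \mapsto lo\downarrow, ro\downarrow.
\end{array}
$$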

After introducing the auxiliary variable $u$, the operator reduction is straightforward:

$$((\neg ri, li)\ C\ u)$$

$$(u / (lo, ro)).$$

The corresponding circuit is shown in Fig. 3.
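A random-interleaving simulation can likewise be used to check the buffer circuit. In this Python sketch (our own; helper names and the four-phase environment, active on $L$ and passive on $R$, are assumptions), the circuit is the C-element $(\neg ri, li)\ C\ u$ and the fork $u/(lo, ro)$, with the two branches of the fork given independent delays. Restricted to $L$ and to $R$, the trace should show correct four-phase handshakes, with one $R$-communication per $L$-communication.

```python
import random


def run(rules, procs, state, seed=0, max_steps=10_000):
    """Random interleaving of production rules and generator processes."""
    rng = random.Random(seed)
    req = {n: next(p, None) for n, p in procs.items()}
    trace = []
    for _ in range(max_steps):
        ready = [r for r in rules if r[0](state) and state[r[1]] != r[2]]
        moves = [("rule", r) for r in ready] + [
            ("proc", n) for n, q in req.items()
            if q is not None and (q[0] == "set" or q[1](state))]
        if not moves:
            break
        kind, what = rng.choice(moves)
        act = what if kind == "rule" else req[what]
        if kind == "rule" or act[0] == "set":
            state[act[1]] = act[2]
            trace.append(act[1] + ("+" if act[2] else "-"))
        if kind == "proc":
            req[what] = next(procs[what], None)
        for r in ready:  # stability check
            assert r is what or r[0](state), "unstable guard for " + r[1]
    return trace


def active(out, inp, n):
    for _ in range(n):
        yield ("set", out, True)
        yield ("wait", lambda s: s[inp])
        yield ("set", out, False)
        yield ("wait", lambda s: not s[inp])


def passive(inp, out, n):
    for _ in range(n):
        yield ("wait", lambda s: s[inp])
        yield ("set", out, True)
        yield ("wait", lambda s: not s[inp])
        yield ("set", out, False)


BUFFER = [
    (lambda s: not s["ri"] and s["li"], "u", True),  # (~ri, li) C u
    (lambda s: s["ri"] and not s["li"], "u", False),
    (lambda s: s["u"], "lo", True),                  # fork u / (lo, ro):
    (lambda s: not s["u"], "lo", False),             # each branch is a
    (lambda s: s["u"], "ro", True),                  # separate rule with
    (lambda s: not s["u"], "ro", False),             # its own delay
]


def project(trace, names):
    """Keep only the transitions on the given variables."""
    return [t for t in trace if t[:-1] in names]


state = dict.fromkeys(["li", "lo", "ri", "ro", "u"], False)
trace = run(BUFFER, {"L": active("li", "lo", 3), "R": passive("ro", "ri", 3)}, state, seed=5)
print(project(trace, {"li", "lo"}))
print(project(trace, {"ro", "ri"}))
```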


8. Message communication

So far, we have only considered the synchronization aspect of the communication actions: no message was passed. The last two examples describe implementations of communications that entail the transmission of messages. We consider the transmission of Boolean variables only; the generalization to other types is relatively straightforward.

Third example: Queue (FIFO) element

Queues (FIFOs) play an important role in pipelined computations for increasing throughput when processing times are variable. A queue consists of the linear composition of a number of buffer elements of the type:

$$E = *[L?(x);\ R!(x)]. \qquad (11)$$

($L?(x)$ is an input action assigning to internal variable $x$ the value received on $L$. $R!(x)$ is an output action sending the value of $x$ on channel $R$.)

We are going to implement the transmission of true messages and of false messages on two independent channels. We shall construct a circuit for each type of message, and then compose the two circuits. Such a technique is called the “double-rail” technique [10]. We get:

$$
\begin{array}{l}
*[[\ \overline{Lt} \to Lt;\ Rt \\
\ \ \mid\ \overline{Lf} \to Lf;\ Rf \\
]],
\end{array}
$$

where $\neg\overline{Lt} \lor \neg\overline{Lf}$ holds at any time.

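If we let channels $Lt$ and $Lf$ share variable $lo$, and channels $Rt$ and $Rf$ share variable $ri$, the handshaking expansion gives the two guarded commands:

$$
\begin{array}{l}
*[[\ li1 \to lo\uparrow;\ [\neg li1];\ lo\downarrow;\ ro1\uparrow;\ [ri];\ ro1\downarrow;\ [\neg ri] \\
\ \ \mid\ li2 \to lo\uparrow;\ [\neg li2];\ lo\downarrow;\ ro2\uparrow;\ [ri];\ ro2\downarrow;\ [\neg ri] \qquad (12) \\
]]
\end{array}
$$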

The production rule expansion of (12) has to guarantee mutual exclusion between the two guarded commands. Since $\neg li1 \lor \neg li2$ holds at any time, it is easy to see that mutual exclusion is guaranteed if we re-order the actions of each guarded command as in the implementation of $B$. We get:

$$
\begin{array}{l}
*[[\ \neg ri \land li1 \to lo\uparrow, ro1\uparrow;\ [ri \land \neg li1];\ lo\downarrow, ro1\downarrow \\
\ \ \mid\ \neg ri \land li2 \to lo\uparrow, ro2\uparrow;\ [ri \land \neg li2];\ lo\downarrow, ro2\downarrow \qquad (13) \\
]]
\end{array}
$$

Since each of the two guarded commands of (13) is identical to (10), the circuit for (12) consists of two copies of the circuit of Fig. 3, composed in the obvious way so as to share $lo$ and $ri$.
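The composition of two copies of the buffer circuit can be sketched in the same way. The way of sharing $lo$ is not spelled out in the text, so this Python sketch assumes an or-operator $(u1, u2) \lor lo$ for it, with $ri$ feeding both C-elements; it also assumes the double-rail convention $li1$/$ro1$ for true and $li2$/$ro2$ for false, and uses hypothetical `sender` and `receiver` processes for the environment. Under random interleavings, the receiver should obtain exactly the sequence of Booleans sent.

```python
import random


def run(rules, procs, state, seed=0, max_steps=10_000):
    """Random interleaving of production rules and generator processes."""
    rng = random.Random(seed)
    req = {n: next(p, None) for n, p in procs.items()}
    trace = []
    for _ in range(max_steps):
        ready = [r for r in rules if r[0](state) and state[r[1]] != r[2]]
        moves = [("rule", r) for r in ready] + [
            ("proc", n) for n, q in req.items()
            if q is not None and (q[0] == "set" or q[1](state))]
        if not moves:
            break
        kind, what = rng.choice(moves)
        act = what if kind == "rule" else req[what]
        if kind == "rule" or act[0] == "set":
            state[act[1]] = act[2]
            trace.append(act[1] + ("+" if act[2] else "-"))
        if kind == "proc":
            req[what] = next(procs[what], None)
        for r in ready:  # stability check
            assert r is what or r[0](state), "unstable guard for " + r[1]
    return trace


QUEUE = [
    (lambda s: not s["ri"] and s["li1"], "u1", True),  # (~ri, li1) C u1
    (lambda s: s["ri"] and not s["li1"], "u1", False),
    (lambda s: not s["ri"] and s["li2"], "u2", True),  # (~ri, li2) C u2
    (lambda s: s["ri"] and not s["li2"], "u2", False),
    (lambda s: s["u1"], "ro1", True), (lambda s: not s["u1"], "ro1", False),
    (lambda s: s["u2"], "ro2", True), (lambda s: not s["u2"], "ro2", False),
    (lambda s: s["u1"] or s["u2"], "lo", True),        # assumed: (u1, u2) or lo
    (lambda s: not s["u1"] and not s["u2"], "lo", False),
]


def sender(bits):
    """Active double-rail output on L: raise li1 for true, li2 for false."""
    for b in bits:
        rail = "li1" if b else "li2"
        yield ("set", rail, True)
        yield ("wait", lambda s: s["lo"])
        yield ("set", rail, False)
        yield ("wait", lambda s: not s["lo"])


def receiver(state, n, received):
    """Passive double-rail input on R: read the raised rail, then acknowledge on ri."""
    for _ in range(n):
        yield ("wait", lambda s: s["ro1"] or s["ro2"])
        received.append(state["ro1"])  # ro1 raised means the value true
        yield ("set", "ri", True)
        yield ("wait", lambda s: not s["ro1"] and not s["ro2"])
        yield ("set", "ri", False)


def transmit(bits, seed=0):
    state = dict.fromkeys(["li1", "li2", "lo", "ri", "ro1", "ro2", "u1", "u2"], False)
    received = []
    run(QUEUE, {"L": sender(bits), "R": receiver(state, len(bits), received)}, state, seed)
    return received


print(transmit([True, False, False, True]))  # [True, False, False, True]
```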
Fourth example: single variable


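Consider the following process that provides read and write access to a simple Boolean variable $x$:

$$
\begin{array}{l}
*[[\ \overline{P} \to P?x \\
\ \ \mid\ \overline{Q} \to Q!x \qquad (14) \\
]],
\end{array}
$$

where $\neg\overline{P} \lor \neg\overline{Q}$ holds at any time.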

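Again, according to the double-rail technique, each guarded command of (14) is expanded into two guarded commands. But now the values true and false have to be explicitly assigned to $x$, in the following way:

$$
\begin{array}{l}
*[[\ pi1 \to x\uparrow;\ [x];\ po\uparrow;\ [\neg pi1];\ po\downarrow \\
\ \ \mid\ pi2 \to x\downarrow;\ [\neg x];\ po\uparrow;\ [\neg pi2];\ po\downarrow \\
\ \ \mid\ x \land qi \to qo1\uparrow;\ [\neg qi];\ qo1\downarrow \qquad (15) \\
\ \ \mid\ \neg x \land qi \to qo2\uparrow;\ [\neg qi];\ qo2\downarrow \\
]]
\end{array}
$$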

The rest of the compilation is now straightforward and is left as an exercise to the reader. (Hint: don’t forget to ensure mutual exclusion between the guarded commands.) The operator reduction gives:

$$
\begin{array}{l}
(pi1, \neg pi2)\ C\ x \\
(pi1, x) \land po1 \\
(pi2, \neg x) \land po2 \\
(po1, po2) \lor po \\
(x, qi) \land qo1 \\
(\neg x, qi) \land qo2.
\end{array}
$$

The circuit is represented in Fig. 6.
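Finally, the operator reduction for the single variable can be exercised with a hypothetical sequential client, which guarantees $\neg\overline{P} \lor \neg\overline{Q}$ by issuing one request at a time. This Python sketch is our own illustration: it models the first line of the reduction as the C-element $(pi1, \neg pi2)\ C\ x$, which sets $x$ on $pi1$ and resets it on $pi2$ (this reading is an assumption), and the names `client` and `exercise` are hypothetical. Reads should return the most recently written value, starting from the initial value false.

```python
import random


def run(rules, procs, state, seed=0, max_steps=10_000):
    """Random interleaving of production rules and generator processes."""
    rng = random.Random(seed)
    req = {n: next(p, None) for n, p in procs.items()}
    trace = []
    for _ in range(max_steps):
        ready = [r for r in rules if r[0](state) and state[r[1]] != r[2]]
        moves = [("rule", r) for r in ready] + [
            ("proc", n) for n, q in req.items()
            if q is not None and (q[0] == "set" or q[1](state))]
        if not moves:
            break
        kind, what = rng.choice(moves)
        act = what if kind == "rule" else req[what]
        if kind == "rule" or act[0] == "set":
            state[act[1]] = act[2]
            trace.append(act[1] + ("+" if act[2] else "-"))
        if kind == "proc":
            req[what] = next(procs[what], None)
        for r in ready:  # stability check
            assert r is what or r[0](state), "unstable guard for " + r[1]
    return trace


VARIABLE = [
    (lambda s: s["pi1"] and not s["pi2"], "x", True),   # (pi1, ~pi2) C x
    (lambda s: not s["pi1"] and s["pi2"], "x", False),
    (lambda s: s["pi1"] and s["x"], "po1", True),       # (pi1, x) and po1
    (lambda s: not s["pi1"] or not s["x"], "po1", False),
    (lambda s: s["pi2"] and not s["x"], "po2", True),   # (pi2, ~x) and po2
    (lambda s: not s["pi2"] or s["x"], "po2", False),
    (lambda s: s["po1"] or s["po2"], "po", True),       # (po1, po2) or po
    (lambda s: not s["po1"] and not s["po2"], "po", False),
    (lambda s: s["x"] and s["qi"], "qo1", True),        # (x, qi) and qo1
    (lambda s: not s["x"] or not s["qi"], "qo1", False),
    (lambda s: not s["x"] and s["qi"], "qo2", True),    # (~x, qi) and qo2
    (lambda s: s["x"] or not s["qi"], "qo2", False),
]


def client(state, ops, reads):
    """Issue ("write", b) or ("read",) requests one at a time."""
    for op, *arg in ops:
        if op == "write":  # active double-rail output on P
            rail = "pi1" if arg[0] else "pi2"
            yield ("set", rail, True)
            yield ("wait", lambda s: s["po"])
            yield ("set", rail, False)
            yield ("wait", lambda s: not s["po"])
        else:  # read: request on qi, value returned on qo1 (true) or qo2 (false)
            yield ("set", "qi", True)
            yield ("wait", lambda s: s["qo1"] or s["qo2"])
            reads.append(state["qo1"])
            yield ("set", "qi", False)
            yield ("wait", lambda s: not s["qo1"] and not s["qo2"])


def exercise(ops, seed=0):
    state = dict.fromkeys(["pi1", "pi2", "po1", "po2", "po", "x", "qi", "qo1", "qo2"], False)
    reads = []
    run(VARIABLE, {"client": client(state, ops, reads)}, state, seed)
    return reads, state["x"]


print(exercise([("read",), ("write", True), ("read",), ("write", False), ("read",)]))
# ([False, True, False], False)
```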
9. Conclusion


We have described a method for implementing a high-level concurrent algorithm (a set of communicating processes) as a network of digital operators that can be directly mapped into a delay-insensitive VLSI circuit. The circuit is derived from the program by a series of systematic, semantics-preserving transformations that we have compared to compiling.

Since the circuits are correct by construction, and in particular since the guards of the production rules are stable by construction, the circuits are free from “hazards”.

The choice between active and passive implementations is usually clear from the context. For instance, implementing input as passive and output as active is safe most of the time. Furthermore, if the wrong choice has been made and it turns out that two active or two passive commands have to be paired, an “adaptor” process can be used. An adaptor is a one-place buffer with $L$ and $R$ both active (a “double-A”) or both passive (a “double-P”). A double-A is used to pair two passive commands, a double-P to pair two active commands.

The simplifying assumption of isochronic forks is not severe, since such a fork is always confined to a very small circuit part. In fact, it is even weaker than the usual isochronic assumption used in self-timed design, where a whole circuit part is assumed isochronic. We believe that isochronic forks can be avoided, but doing so would complicate the circuits without real advantage in return.

We also believe that the basic set of operators used in this paper, extended with an arbiter and a synchronizer to implement mutual exclusion among independent commands, is sufficient for all purposes. (Obviously, having both “and” and “or” is redundant.) However, there is no interest in confining the designer to a minimal set of operators. On the contrary, since one of the advantages of VLSI is the possibility to create operators at no cost, introducing other operators (for instance, “and” and “or” with more than two inputs, or exclusive-or) may often simplify a circuit drastically.

We have illustrated the method with four simple (sometimes deceptively so) but characteristic examples that embody very standard control and data structures. The method has also been tested on quite difficult examples, like the distributed mutual exclusion circuit described in [4]. In [5], we have used the method to solve an open problem: it had been conjectured that it is impossible to construct a delay-insensitive fair arbiter. We have disproved the conjecture by constructing such an arbiter with our method.

The most encouraging aspect of the method is that it is really a synthesis technique: it allows a designer to construct solutions that he would never have found had he not applied the method.